A test of the null hypothesis that a hazard rate is monotone non-
decreasing, versus the alternative that it is not, is proposed. Both
the test statistic and the means of calibrating it are new. Unlike pre- 
vious approaches, neither is based on the assumption that the null 
distribution is exponential. Instead, empirical information is used to 
effectively identify and eliminate from further consideration parts of 
the line where the hazard rate is clearly increasing; and to confine 
subsequent attention only to those parts that remain. This produces a 
test with greater apparent power, without the excessive conservatism 
of exponential-based tests. Our approach to calibration borrows from 
ideas used in certain tests for unimodality of a density, in that a band- 
width is increased until a distribution with the desired properties is 
obtained. However, the test statistic does not involve any smoothing, 
and is, in fact, based directly on an assessment of convexity of the 
distribution function, using the conventional empirical distribution. 
The test is shown to have optimal power properties in difficult cases, 
where it is called upon to detect a small departure, in the form of a 
bump, from monotonicity. More general theoretical properties of the 
test and its numerical performance are explored. 

1. Introduction. Estimation of a hazard rate under the hypothesis that 
it is nondecreasing, and testing the validity of this assumption, are motivated 
by problems where failure rate of a machine part or a biological system can 
be expected to increase with lifetime. If for some reason a machine part 
becomes more reliable with time over at least part of its life cycle, then it 
can be particularly important to know that fact. The knowledge may lead 
to changes in the way the part is manufactured or finished, so as to remove
the requirement for a running-in period where failure is relatively likely to
occur. In this paper we suggest a new test statistic of the null hypothesis of 
monotone nondecreasing failure rate and a new approach to calibrating the 
distribution of the statistic so as to determine a critical point for the test. 

Our methods confer two advantages relative to existing approaches. First, 
our test statistic is focused on relatively "local" departures from the null 
hypothesis of nondecreasing hazard rate, and pays relatively little attention 
to those parts of the sample space where the hazard rate is indeed monotone 
nondecreasing. Nevertheless, the method is easily localized still further, since 
it focuses on variation of the hazard rate over an interval which can be 
increased or decreased at the investigator's discretion, or, indeed, replaced 
by the union of two or more intervals. 

Second, our new method of calibration makes the test statistic much more 
sensitive to relatively small departures from the null hypothesis. For a given 
nominal probability of rejection, our calibration approach produces a test 
with greater apparent power than do standard methods based on calibration 
by comparison with the exponential distribution. The reason is that the 
exponential case is particularly awkward to detect; the corresponding hazard 
rate is perfectly flat, and, therefore, to avoid incorrectly rejecting the null 
hypothesis in this case, the test statistic has to satisfy itself that there are 
no significant bumps on a perfectly flat line. In consequence, the test tends 
to overlook small bumps, for fear of committing a Type I error, and so has 
relatively low power against hazard rates that are nondecreasing except for 
small bumps. 

The test we propose has substantially greater apparent power in so-called 
"difficult cases" (cf. [7]) than does, for example, Proschan and Pyke's [19] 
test, calibrated using the exponential distribution. Indeed, we shall prove 
that our method has optimal power in this setting. That is, it is able to 
detect a very small perturbation of the empirical distribution, placed at a 
point where it produces a small nonmonotone bump in the hazard rate, and 
so small that even a likelihood ratio test (requiring knowledge of the shape 
of the bump) is barely able to detect the bump. 

Our calibration method is related to the "increasing bandwidth" approach 
first suggested by Silverman [20] in the case of density estimation, and used 
in a range of other settings since; see [6] for an application in the setting 
of monotone nonparametric regression. However, quite unlike those appli- 
cations, we increase the bandwidth only for the purpose of calibrating the 
test. Our test statistic does not involve any smoothing at all and is based 
directly on the standard empirical distribution function. 

Contributions to the problem of testing for a constant hazard rate against
a monotone alternative include those of Bickel and Doksum [5], based,
like the method of Proschan and Pyke [19], on normalized spacings; Bickel
[4], on the existence of asymptotically most powerful tests; Barlow and
Doksum [3], on the more general problem of testing for convex orderings; and
Ahmad [1], Gail and Gastwirth [11, 12] and Klefsjö [16], who proposed tests
of the hypothesis of an exponential distribution. However, these approaches 
share the drawbacks noted above for exponential-based methods. A related 
difficulty arises in the context of testing for unimodality of a probability 
density by calibrating against the most difficult case of a uniform density; 
see, for example, [15]. Hall, Huang, Gifford and Gijbels [14] have suggested 
methods for estimating a hazard rate under the assumption of monotonicity 
and surveyed earlier work on the topic. 

Although our focus is on testing the null hypothesis of a monotone nonde- 
creasing hazard rate, the case where the null asserts a monotone nonincreas- 
ing rate is related. In the former case, the smoothed empirical hazard rate 
estimator is guaranteed to be monotone nondecreasing for all sufficiently 
large bandwidths, and this property is not available in the latter setting. 
The property makes it particularly easy to propose a bandwidth selection 
rule that ensures resampling from a distribution that satisfies the null; we 
may start with any conventional bandwidth selector, for example, based on 
a plug-in rule, and steadily increase the bandwidth until the smoothed em- 
pirical distribution has a monotone nondecreasing hazard rate in the region 
where the test is to be conducted. 

There is also a simple rule in the case where $H_0$ stipulates that the
hazard rate is nonincreasing: starting with any conventional bandwidth selector,
increase the bandwidth until a monotone nonincreasing hazard rate is
obtained; or, if that does not occur no matter how large the bandwidth, reject
the null hypothesis at this point without passing to a further step. This rule
is justified by the fact that, if the hazard rate is nonincreasing, then the
probability that there exists a finite bandwidth (of larger order than the
conventional $n^{-1/5}$), such that the smoothed empirical hazard rate is
nonincreasing, generally converges to 1 as sample size increases. Nevertheless,
in the remainder of the paper we shall address only the more practically
important case where $H_0$ asserts a nondecreasing hazard rate.

2. Methodology. 

2.1. Test statistic. Suppose the random sample $\mathcal{X} = \{X_1, \ldots, X_n\}$ is drawn
from a distribution with distribution function $F$. The standard empirical
distribution function is $\hat F(x) = n^{-1} \sum_{i=1}^n I(X_i \le x)$, where $I(\mathcal{E})$ denotes the
indicator function of an event $\mathcal{E}$. The null hypothesis that $F$ has monotone
nondecreasing hazard rate on an interval $\mathcal{J}$ is equivalent to $H = -\log(1 - F)$
being convex on $\mathcal{J}$, and, hence, provided $F$ is twice differentiable with a
nonvanishing first derivative on $\mathcal{J}$, to $H''$ being nonnegative on $\mathcal{J}$. The
function $H$ is the cumulative hazard rate. Its derivative is the hazard rate.






P. HALL AND I. VAN KEILEGOM



The empirical form $\hat H$ of $H$, $\hat H = -\log(1 - \hat F)$, is not differentiable, however.
Therefore, it makes little sense to test the null hypothesis by checking for
nonnegativity of the second derivative of $\hat H$. We could investigate methods
based directly on smoothed forms of $\hat H$, but this would not necessarily lead
to tests that have good power properties; see Section 2.3. Instead we note
that convexity of $H$ on $\mathcal{J}$ is equivalent to nonnegativity of
$H(x + y) + H(x - y) - 2H(x)$ for all $x$ and $y$ such that both $x + y$ and $x - y$ are
elements of $\mathcal{J}$. It is not essential to take $\mathcal{J}$ to be an interval; it can be
replaced by a disjoint union of intervals, for example. In the latter case it is,
however, necessary to integrate in $T$ [defined in (2.2)] over the pairs $(x, y)$
belonging to $\mathcal{J}$ so that $x + y$ and $x - y$ lie in the same interval as $x$.
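
For twice continuously differentiable $H$, the link between the second difference and $H''$ can be made explicit. The identity below is a standard consequence of Taylor's theorem with integral remainder, added here as a worked step rather than quoted from the paper:

```latex
\[
H(x+y) + H(x-y) - 2H(x)
  \;=\; y^{2} \int_{0}^{1} \bigl\{ H''(x+ty) + H''(x-ty) \bigr\} (1-t)\,dt .
\]
```

The right-hand side is nonnegative whenever $H'' \ge 0$ between $x - y$ and $x + y$; conversely, dividing by $y^2$ and letting $y \to 0$ gives $H''(x)$, since $\int_0^1 2(1-t)\,dt = 1$.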

Therefore, a test of the hypothesis of increasing hazard rate or,
equivalently, of

(2.1) $H_0$: $H$ is convex on $\mathcal{J}$,

is to reject $H_0$ in favor of its complement if the value of

(2.2) $T = \iint_{x,y\,:\,x+y,\,x-y \in \mathcal{J}} \max\{0,\, 2\hat H(x) - \hat H(x+y) - \hat H(x-y)\}^r\, w(x,y)\,dx\,dy$

is "too large." The exponent r is an arbitrary positive number and w is a 
nonnegative weight function. By taking the maximum in the argument of 
the integral at (2.2), we have largely restricted attention to places where 
the sampled distribution has a decreasing hazard rate. (Here and below we 
use the words "increasing" and "decreasing" to mean "nondecreasing" and 
"nonincreasing," resp.) Further restriction will be made through our method 
for calibration, which uses the data to determine where the hazard rate is 
more likely to be increasing or decreasing. 
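
The statistic at (2.2) can be approximated on a grid. The Python sketch below is illustrative only and is not taken from the paper. It assumes $w \equiv 1$, an interval $\mathcal{J}$ = [lo, hi] whose right endpoint lies strictly below the largest observation (so that the empirical cumulative hazard is finite on $\mathcal{J}$), and a plain Riemann sum. The function name `hazard_test_statistic` and the grid size are hypothetical choices.

```python
import numpy as np


def hazard_test_statistic(sample, lo, hi, r=1.0, grid=401):
    """Riemann-sum approximation to T in (2.2), with w = 1 and J = [lo, hi]."""
    xs = np.sort(np.asarray(sample, dtype=float))
    if hi >= xs[-1]:
        # The empirical cumulative hazard is infinite at the sample maximum.
        raise ValueError("hi must lie strictly below the largest observation")
    x = np.linspace(lo, hi, grid)
    step = x[1] - x[0]
    # Empirical distribution function and cumulative hazard on the grid.
    F_hat = np.searchsorted(xs, x, side="right") / xs.size
    H_hat = -np.log1p(-F_hat)
    total = 0.0
    # y = k * step; x - y and x + y must both be grid points, hence inside J.
    for k in range(1, (grid - 1) // 2 + 1):
        second_diff = 2.0 * H_hat[k:-k] - H_hat[2 * k:] - H_hat[:-2 * k]
        total += np.sum(np.maximum(second_diff, 0.0) ** r)
    # The integrand is even in y, so the y < 0 half doubles the sum.
    return 2.0 * total * step * step


rng = np.random.default_rng(0)
decreasing = rng.weibull(0.5, size=200)  # Weibull shape < 1: decreasing hazard
increasing = rng.weibull(3.0, size=200)  # Weibull shape > 1: increasing hazard
t_dec = hazard_test_statistic(decreasing, 0.0, np.quantile(decreasing, 0.9))
t_inc = hazard_test_statistic(increasing, 0.0, np.quantile(increasing, 0.9))
```

In this usage the sample with decreasing hazard should give a much larger value than the sample with increasing hazard, for which only noise in the empirical cumulative hazard contributes.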

2.2. Calibration. Our approach to calibration will be based on bootstrap
sampling from the distribution determined by a kernel density estimator,

$\tilde f(x|h) = (nh)^{-1} \sum_{i=1}^n K\{(x - X_i)/h\}$,

where $K$ is a kernel and $h$ a bandwidth. We shall choose $K$ to be a smooth,
symmetric density function, its graph being of conventional bell shape. Let
$\tilde F$ denote the distribution function corresponding to the density $\tilde f$, and let
$\tilde H = -\log(1 - \tilde F)$ be the associated cumulative hazard function. Then

(2.3) $\tilde H''(x) = -(d/dx)^2 \log\{1 - \tilde F(x)\} = \dfrac{\{1 - \tilde F(x)\}\tilde f'(x) + \tilde f(x)^2}{\{1 - \tilde F(x)\}^2}$.

We shall write $\tilde H''(x)$ as $\tilde H''(x|h)$ when it is necessary to indicate dependence
on bandwidth, and, as at (2.3), we shall drop the notation $h$ from quantities
such as $\tilde f(\cdot|h)$ when it is not necessary for our argument. An empirical
approach to bandwidth choice will be employed, as follows.

Let $\hat h$ denote a conventional empirical bandwidth, the asymptotic size of
which is $n^{-1/5}$. We shall call $\hat h$ the "starting bandwidth." Examples include
the bandwidths selected by the bootstrap, cross-validation or plug-in methods.
Steadily increase the bandwidth, starting from $\hat h$, and stopping on the
first occasion on which $\tilde H''$ does not change sign on $\mathcal{J}$. Define

(2.4) $\hat h_{\mathrm{crit}} = \inf\{h \ge \hat h : \text{the equation } \tilde H''(\cdot|h) = 0 \text{ has no solution on } \mathcal{J}\}$.

We claim that if $\mathcal{J}$ is a compact interval, then for all sufficiently large $h$,
$\tilde H''(\cdot|h) > 0$ on $\mathcal{J}$, and so the set at (2.4) is not empty. Therefore,
$\hat h_{\mathrm{crit}}$ is well defined.

To verify the claim, assume $K$ has two continuous derivatives in a
neighborhood of the origin, $K(0) > 0$ and $K'(0) = 0$, and observe that as
$h \to \infty$, $\tilde f(x|h) = h^{-1} K(0) + o_p(h^{-1})$ and
$\tilde f'(x|h) = h^{-3} K''(0)\, n^{-1} \sum_i (x - X_i) + o_p(h^{-3})$, where both relations hold
uniformly in $x \in \mathcal{J}$. It follows that, for all sufficiently large $h$,
$\tilde f(x)^2 > |\tilde f'(x)|$ for all $x \in \mathcal{J}$. The claim that $\tilde H''(\cdot|h) > 0$
on $\mathcal{J}$ now follows from (2.3).

Having computed $\hat h_{\mathrm{crit}}$, we repeatedly create samples of size $n$ by sampling
randomly, with replacement, from the distribution with density $\tilde f(\cdot|\hat h_{\mathrm{crit}})$,
and thereby repeatedly compute bootstrap values, $T^*$ say, of the statistic $T$.
Arguing thus, and given a nominal probability of rejection, $\alpha$ say, for the
test, we may compute a critical point $c(\alpha)$ defined by

$P\{T^* > c(\alpha) \mid \mathcal{X}\} = \alpha$.

The test takes the form: reject the null hypothesis if $T > c(\alpha)$.
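
The calibration step can be sketched as a smoothed bootstrap. The Python code below is an illustration under stated assumptions, not the authors' code. Drawing from a Gaussian-kernel density estimate is done by resampling the data with replacement and adding Gaussian noise with standard deviation equal to the bandwidth. The statistic is passed in as a callable, so the sketch does not depend on any particular implementation of $T$. The names `bootstrap_critical_point` and `n_boot` are hypothetical.

```python
import numpy as np


def smoothed_bootstrap_sample(sample, h, rng):
    """Draw n values from the Gaussian-kernel density estimate with bandwidth h."""
    n = sample.size
    return rng.choice(sample, size=n, replace=True) + h * rng.standard_normal(n)


def bootstrap_critical_point(sample, h_crit, statistic, alpha=0.10, n_boot=500, rng=None):
    """Return (c, t_star), where c is the (1 - alpha) quantile of the bootstrap values."""
    rng = np.random.default_rng() if rng is None else rng
    sample = np.asarray(sample, dtype=float)
    t_star = np.array(
        [statistic(smoothed_bootstrap_sample(sample, h_crit, rng)) for _ in range(n_boot)]
    )
    # P(T* > c | data) = alpha corresponds to the upper alpha quantile.
    return float(np.quantile(t_star, 1.0 - alpha)), t_star


# Toy usage with the sample mean standing in for the test statistic.
rng = np.random.default_rng(7)
data = rng.standard_normal(100)
c_alpha, t_star = bootstrap_critical_point(data, 0.3, np.mean, alpha=0.10, n_boot=400, rng=rng)
```

In an application, `statistic` would compute $T$ on the fixed interval $\mathcal{J}$, and the null hypothesis would be rejected when the observed $T$ exceeds the returned critical point.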

2.3. The road not taken: tests based on $H''$. In the test described in
Sections 2.1 and 2.2 we have used smoothing methods only for calibration,
not to construct the test statistic itself. An alternative approach would be
to base a test directly on the property that, when $H$ is twice continuously
differentiable, the null hypothesis is satisfied if and only if $H'' \ge 0$ on $\mathcal{J}$. In
particular, we could construct a smoothed version, $\tilde H$ say, of $\hat H$ with the
property that $\tilde H''$ is a consistent estimator of $H''$, and reject $H_0$ if (e.g.)
$S = \int_{\mathcal{J}} \{\max(0, -\tilde H'')\}^r$ is "too large."

This approach has drawbacks, however. First, it requires a bandwidth to
be chosen when constructing the test statistic $S$; a second bandwidth would
be needed when calibrating the test, if calibration were to involve sampling
from a smoothed distribution. Second, the power of the test will depend
intimately on choice of the first bandwidth. Indeed, the minimum distance
from the null hypothesis at which local alternative distributions can be
detected by the test will generally be proportional to $n^{-1/2} h^{-c}$, where $h$ is
the bandwidth employed when constructing $S$, and $c > 0$ depends on the
smoothing method used. Examples of this behavior in more conventional
testing problems may be found in the work of Anderson, Hall and
Titterington [2], Lavergne and Vuong [18] and Delecroix, Hall and Roget [10].

3. Theoretical properties. 

3.1. Summary of properties. Section 3.2 shows that, if $H$ is in the class
$\mathcal{H}_{01}$ of hazard rates for which $H''$ is bounded above zero on $\mathcal{J}$ [see (3.1)],
then the statistic $T$ is of size $n^{-1}$ and asymptotically normally distributed.
The bootstrap accurately captures this distribution. As the convexity of $H$
becomes more marginal, the stochastic fluctuations of $T$ increase. Thus, if
$H$ is in the class $\mathcal{H}_{02}$ [see (3.6)] of hazard rates for which $H''$ vanishes at
just a finite number of discrete points in $\mathcal{J}$, then the size of $T$ increases to
$O(n^{-6/7})$, and its distribution becomes nonnormal (see Section 3.3). The
size of $T$ increases still further, to $O_p(n^{-1/2})$, if $H''$ vanishes on an interval,
and, in particular, if $F$ is an exponential distribution. (See Section 3.7, and
see the third paragraph of Section 1 for an intuitive account of difficulties
experienced calibrating against the exponential distribution.) Properties of
our calibration method, when $H$ is in $\mathcal{H}_{02}$, are treated in Section 3.4, where
it is shown that the asymptotic probability of rejection is bounded away
from zero. (Section 4 reports numerical properties in this case.) By way of
contrast, if calibration is made against the exponential distribution then,
when $H$ is in $\mathcal{H}_{01}$ or $\mathcal{H}_{02}$, the rejection probability converges to zero
(Section 3.7), implying that this approach is ultra-conservative. Optimality
of our approach for identifying small, nonmonotone "wiggles" in the hazard
rate is proved in Section 3.5. The ability of our calibration method to
identify a fixed departure from the null hypothesis is shown in Section 3.6.

3.2. Strict monotonicity of hazard rate. Throughout Section 3 we shall
define the statistic $T$ by taking $r = 1$ and $w \equiv 1$ in the definition at (2.2).
Let $\mathcal{H}_{01}$ be the following subset of the class of cumulative hazard functions
for which $H_0$, defined at (2.1), holds:

(3.1) $\mathcal{H}_{01} = \{H : H'' \text{ has two continuous derivatives on } \mathcal{J} \text{ and } H'' > 0 \text{ on } \mathcal{J}\}$.

(We would mention that neither $\mathcal{H}_{01}$ nor $\mathcal{H}_{02}$, the latter introduced in
Section 3.3, is closed.) Put $g = f^{1/2}/(1 - F)$,

$\mu = -\int_{\mathcal{J}} dx \int_{-\infty}^{\infty} E[\min\{0,\, y^2 H''(x) + g(x)(2|y|)^{1/2} N\}]\,dy > 0$,

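$\sigma^2 = \int_{\mathcal{J}} dx \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \mathrm{cov}\bigl(\min\{0,\, y_1^2 H''(x) + g(x) W(y_1)\},\; \min[0,\, y_2^2 H''(x) + g(x)\{W(y_2 + y_3) - W(y_3)\}]\bigr)\,dy_1\,dy_2\,dy_3$,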



HAZARD RATE TEST 






where the random variable $N$ has the standard normal distribution and $W$
denotes a standard Brownian motion. It is clear that $\mu$ is finite; our proof
of Theorem 3.1 will show that $\sigma^2$ is also well defined and finite.

Theorem 3.1. Assume the distribution function $F$ has three continuous
derivatives on an open interval $\mathcal{J}'$ which contains the compact interval $\mathcal{J}$, and
that the density $f = F' > 0$ on $\mathcal{J}$. If $H \in \mathcal{H}_{01}$, then $T = n^{-1}\mu + n^{-7/6}\sigma N_n$,
where $N_n$ is asymptotically normally distributed with zero mean and unit
variance.

A version of the theorem continues to hold if the distribution function 
F = Fn is allowed to depend on n. The main requirements in this case are 
that the regularity conditions hold in a contiguous way, and Fn converge 
sufficiently fast to a proper limiting distribution, G say. In particular, Fn 
and G (the former for all sufficiently large n) should satisfy the conditions 

of the theorem, and, for j = 0, 1 and 2, Fn^ — G^^^ should converge to 
at a faster rate than n~^/^, uniformly on 3' . Under these assumptions, the 
limiting distribution of $T$ is that defined when, in the definitions of $\mu$ and
$\sigma$, $F$ is replaced by $G$. The proof requires only minor modifications.

This result may be used to prove that if $H \in \mathcal{H}_{01}$, and under mild
conditions on $h$ and $K$, the bootstrap estimator of the distribution of $T$ is strongly
consistent for the limiting distribution of $T$. Our next theorem will state this
result. To formulate it, put $\mathcal{H}(\xi_1, \xi_2) = [n^{-\xi_2}, n^{-\xi_1}]$, where

(3.2) ^<6<6<i. 
Assume that 

(3.3) $K$ is a symmetric, compactly supported probability density with a
Hölder-continuous derivative.

Note particularly that bandwidths of size $n^{-1/5}$ are in $\mathcal{H}(\xi_1, \xi_2)$ if (3.2)
holds. Indeed, conventional bandwidth selectors, for example, those based
on bootstrap methods, cross-validation or plug-in rules, satisfy

(3.4) ^ ' ' ' 

as $n \to \infty$, for some $0 < C_1 < C_2 < \infty$.

Let $T^*$ denote the version of $T$, defined at (2.2), but with $r = 1$ and
$w \equiv 1$, and computed from a sample drawn by sampling randomly from the
distribution $\tilde F$ conditional on $\mathcal{X}$. Let $\mu$ and $\sigma$ be as in Theorem 3.1, and
write $\Phi$ for the standard normal distribution function.



Theorem 3.2. Assume that the (possibly random) bandwidth $h$ lies in
$\mathcal{H}(\xi_1, \xi_2)$, where $\xi_1$ and $\xi_2$ satisfy (3.2), and that $K$ satisfies (3.3). Suppose
too that $F$ has four bounded derivatives on an open interval $\mathcal{J}'$ which contains
the compact interval $\mathcal{J}$, that $f > 0$ on $\mathcal{J}$ and that $H \in \mathcal{H}_{01}$. Then, uniformly
in $x$ and with probability 1,

(3.5) $P\{n^{7/6}(T^* - n^{-1}\mu)/\sigma \le x \mid \mathcal{X}\} \to \Phi(x)$

as $n \to \infty$.

Since the conclusion of Theorem 3.1 may be stated equivalently as

$P\{n^{7/6}(T - n^{-1}\mu)/\sigma \le x\} \to \Phi(x)$,

then (3.5) may be interpreted as implying that the bootstrap distribution
of $T^*$ converges to the limiting distribution of $T$, provided $H \in \mathcal{H}_{01}$.

It should be mentioned too that if a starting bandwidth $\hat h$ is chosen using
a standard method such as the bootstrap, cross-validation or plug-in, and
if the method suggested in Section 2.2 is employed to calculate the critical
bandwidth $\hat h_{\mathrm{crit}}$, then, under the conditions imposed on $F$ and $K$ in
Theorem 3.2, it is true with probability 1 that $\hat h = \hat h_{\mathrm{crit}}$ for all sufficiently large $n$.
That is to say, the iterative process used to define $\hat h_{\mathrm{crit}}$ stops at the very
first step. This is a consequence of two properties: (i) if $H \in \mathcal{H}_{01}$, then $H''$
must, in fact, be bounded above zero on the compact interval $\mathcal{J}$; and (ii) if
a bandwidth of conventional size is used, then $\tilde H''$ converges uniformly to
$H''$ on $\mathcal{J}$ with probability 1. Together (i) and (ii) imply that with
probability 1 $\tilde H''$ is bounded above zero for all sufficiently large $n$, and, hence, that
$\hat h = \hat h_{\mathrm{crit}}$ for all sufficiently large $n$.

Furthermore, with probability 1 $\hat h \in \mathcal{H}(\xi_1, \xi_2)$ for all sufficiently large $n$.
Therefore, when $H \in \mathcal{H}_{01}$ the calibration step in Section 2.2 degenerates
in asymptotic terms to simply using the standard bandwidth selector, in
which case its properties are covered by Theorem 3.2. In particular, using
a standard bandwidth selector leads to consistent estimation of the limiting
distribution of $T$ when $H \in \mathcal{H}_{01}$.

3.3. Strict monotonicity at all but a finite number of points. Let $\mathcal{H}_{02}$ be
the following subset of the class of cumulative hazard functions satisfying $H_0$:

(3.6) $\mathcal{H}_{02} = \{H : H'' \text{ has two continuous derivatives on } \mathcal{J}, \text{ and } H'' > 0 \text{ on } \mathcal{J}, \text{ except for a finite number of distinct points } x_1, \ldots, x_m \in \mathcal{J}, \text{ where } H'' \text{ vanishes and } H^{(4)} > 0\}$.

We assume $m \ge 1$. Note that it is not possible for $H''$ to vanish at a point $x$,
for $H^{(4)}$ to be strictly negative there, and at the same time for the hazard
rate to be strictly increasing on sufficiently small intervals containing $x$.

The case of strict monotonicity at all but a finite number of points may
fairly be interpreted as the boundary between cases where $H \in \mathcal{H}_{01}$ and
those where the hazard rate has decreasing parts in the vicinities of points
$x_1, \ldots, x_m$. The assumption that $H''(x_i) = 0$ and $H^{(4)}(x_i) > 0$ implies that
the hazard rate has a "shoulder" at $x_i$ and is on the verge of decreasing there.
Therefore, testing in this context means attempting to identify alternative
hypotheses in difficult cases; compare [7]. It offers the opportunity to assess
performance against local alternative hypotheses, an opportunity we shall
take up in Section 3.5. The opportunity is virtually absent in the setting of
Section 3.2.

Let Zi, . . . , Zm be independent random variables, Zi having the distribu- 
tion of 

/oo roo 
/ min{0, ilx^ + ^y^)H^^^ {x^) + g{xi) W{x + y)} dx dy, 
-oo J ~oo 

where $W$ denotes a standard Brownian motion. For simplicity, we shall
assume that

(3.8) no $x_i$ is an endpoint of $\mathcal{J}$.

Theorem 3.3 has an analogue in the contrary case; it involves altering the
distribution of $Z_i$ when $x_i$ is an endpoint.

Theorem 3.3. Assume $F$ has four continuous derivatives on an open
interval which contains the compact interval $\mathcal{J}$, and that $f = F' > 0$ on $\mathcal{J}$.
Suppose too that $H \in \mathcal{H}_{02}$ for points $x_1, \ldots, x_m$ in the definition of that
function class, and that (3.8) holds. Then we may write $T = n^{-6/7} \sum_{1 \le i \le m} Z_{ni}$,
where the joint distribution of $(Z_{n1}, \ldots, Z_{nm})$ converges to that of $(Z_1, \ldots, Z_m)$.

Again, a version of the theorem holds when $F = F_n$ varies with $n$. However,
a direct analogue of Theorem 3.2 does not exist in this setting. Essentially,
this is because a bandwidth that is sufficiently large to ensure convergence
of $\tilde H^{(4)}$ to $H^{(4)}$, and so capture the role of $H^{(4)}(x_i)$ in the definition of
the distribution of $Z_i$, is too large to allow sufficiently fast convergence for
capturing other features of the limiting distribution. Thus, in the
"boundary" case treated by Theorem 3.3, there is not a direct way, based on the
estimator $\tilde F$ and using a bandwidth that is asymptotic to a nonrandom
quantity, of calibrating the test so as to capture the exact distribution of $T$.

Details behind this claim will be given in Section 5.4. These difficulties
persist even if $\tilde F$ is computed using a high-order kernel.

One way of overcoming these difficulties would be to locally model the
behavior of $F$ in the neighborhood of points $x$ where $H''(x)$ was small, rather
than leaving estimation there up to the generic estimator $\tilde F$, and to use the
model directly to estimate the distributions of $Z_1, \ldots, Z_m$. This approach is
rather cumbersome, however, and so, for simplicity, we shall not consider
it further. Moreover, the problems are largely overcome by the calibration
method proposed in Section 2.2, the theory of which we treat next.

3.4. Calibration based on $\hat h_{\mathrm{crit}}$. The calibration method suggested in
Section 2.2 produces a test for which the rejection probability, for $H \in \mathcal{H}_{02}$,
converges to a number that lies strictly between 0 and 1, and so suffers less
from the difficulties noted above. First we describe limiting behavior of the
critical bandwidth, $\hat h_{\mathrm{crit}}$, in the case $H \in \mathcal{H}_{02}$. For simplicity we assume there
is only a single point, $x_1$, at which $H''$ vanishes.

Define $c = \tfrac12 \int u^2 K(u)\,du$ and

(3.9) $S(q, x, y) = (cq^2 + \tfrac12 x^2 + \tfrac{1}{12} y^2) H^{(4)}(x_1) + q^{-2} g(x_1) \int K''(u)\,du \int_0^1 \{W(x + ty - qu) + W(x - ty - qu)\}(1 - t)\,dt$,

where $g = f^{1/2}/(1 - F)$ and $W$ is a standard Brownian motion. Let $Q > 0$
denote the infimum of values $q > 0$ such that $S(q, x, y) > 0$ for all real $x, y$.



Theorem 3.4. Assume the conditions of Theorem 3.3, but with $m = 1$.
Suppose too that $K$ is a symmetric, compactly supported probability
density with two Hölder-continuous derivatives, and that the starting bandwidth
$\hat h$ used to initiate the algorithm that produces $\hat h_{\mathrm{crit}}$ satisfies (3.4). Then
$n^{1/7} \hat h_{\mathrm{crit}} \to Q$ in distribution as $n \to \infty$.



Next we describe the asymptotic rejection probability for the test when
$H \in \mathcal{H}_{02}$. For $0 < \alpha < 1$, define $z_\alpha$ to be the $\alpha$-level quantile of the
distribution defined at (3.7) in the case $i = 1$. Noting (3.7), we see that we
may write $z_\alpha$ as a continuous function of $H^{(4)}(x_1)$ and $g(x_1)$, say
$z_\alpha = \tau_\alpha\{H^{(4)}(x_1), g(x_1)\}$. Put

(3.10) $S(x) = S(Q, x, 0) = (cQ^2 + \tfrac12 x^2) H^{(4)}(x_1) + Q^{-2} g(x_1) \int_{-\infty}^{\infty} K''(u)\, W(x - Qu)\,du$.

It follows from the definition of $Q$ that, with probability 1, (a) $S(x) \ge 0$
for $-\infty < x < \infty$, (b) there exists a unique (random) point $x = A$ at which
$S(x) = 0$, and (c) $S'(A) = 0$ and $S''(A) > 0$. [To appreciate why, observe
that $S$ is asymptotically proportional to $\tilde H''(x_1 + n^{-1/7} x)$, after taking the
bandwidth to equal $n^{-1/7} Q$. Note that the second derivative of $S$ is well
defined and continuous as long as $K$ has three continuous derivatives.]

Let $Z_1$ denote the random variable at (3.7) when $i = 1$, constructed
using the same Brownian motion $W$ as at (3.10). Therefore, $Z_1$ and $S''(A)$
are linked through $W$. In interpreting the theorem below, note that the
probability that $Z_1 < \tau_\alpha\{H^{(4)}(x_1), g(x_1)\}$ equals $\alpha$.

Theorem 3.5. Assume the conditions of Theorem 3.4, but with the
additional requirement that $K$ have three continuous derivatives. Take
$h = \hat h_{\mathrm{crit}}$. Then the rejection probability for the bootstrap test converges as
$n \to \infty$ to the probability that $Z_1 < \tau_\alpha\{S''(A), g(x_1)\}$.

3.5. Power against local alternatives and optimality. Let $F$ denote a
four-times continuously differentiable distribution function for which the
corresponding hazard rate is in $\mathcal{H}_{02}$. Assume for simplicity that there is only
one point at which, for this $F$, $H''$ vanishes on $\mathcal{J}$. Let this point be $x_1 = 0$,
and take it to be an interior point of $\mathcal{J}$. Since $H \in \mathcal{H}_{02}$, then $H''(0) = 0$ and
$H^{(4)}(0) > 0$.

We shall add a "wiggle" to F in the vicinity of the origin, such that the 
perturbed distribution violates the null hypothesis. The perturbation will 
be chosen so that it is only barely detectable using an optimal parametric 
method, that is, the likelihood-ratio test. We shall then explore the perfor- 
mance of our nonparametric test, based on the statistic T, and show that it 
too is able to detect the wiggle. 

The perturbation, $a\varepsilon^4 \Psi(x/\varepsilon)$, is based on a four-times continuously
differentiable function $\Psi$ supported on $[-1, 1]$. The constant $a > 0$ represents the
height of the wiggle, and $\varepsilon = \varepsilon(n) \to 0$ indicates the extent of the
perturbation away from its center, at the origin. We shall choose $\varepsilon$ so small that the
perturbation is only barely detectable by the likelihood-ratio test. Our
construction of the perturbation ensures that, like the distribution $F$ to which
it is added, it has four bounded derivatives near the origin.

The perturbed distribution is

(3.11) $F_n(x) = F(x) + a\varepsilon^4 \Psi(x/\varepsilon)$.

(It is possible, for small $n$, that $F_n$ will be decreasing in some region, but
for the choice $\varepsilon = n^{-1/7}$ that we shall make, and under the other regularity
conditions of Theorem 3.6, $F_n$ will be nondecreasing on $\mathcal{J}$ for all sufficiently
large $n$.) Let $H_n$ denote the cumulative hazard rate corresponding to $F_n$. If
we choose $\Psi$ so that $\Psi(x) = -x^2$ in a neighborhood of the origin, then, for
each $a > 0$ and all sufficiently large $n$, $H_n'$ is strictly monotone decreasing
in a neighborhood of 0. [This neighborhood is of width $O(\varepsilon)$.] Therefore, $F_n$
fails to satisfy the null hypothesis of an increasing hazard rate.

The density $f_n = F_n'$ satisfies $f_n(x) = f(x) + a\varepsilon^3 \psi(x/\varepsilon)$, where $\psi = \Psi'$.
Since $f_n$ must be a density, then $\int \psi = 0$. Now,

$\log\{f_n(x)/f(x)\} = \dfrac{a\varepsilon^3 \psi(x/\varepsilon)}{f(x)} - \dfrac{a^2 \varepsilon^6 \psi(x/\varepsilon)^2}{2 f(x)^2} + \cdots$.

Therefore, putting $b(a) = \tfrac12 a^2 f(0)^{-1} \int \psi^2$, $f_+ = f_n$ and $f_- = f$, we have,
taking the $\pm$ signs, respectively,

(3.12) $\int f_\pm(x) \log\{f_n(x)/f(x)\}\,dx = \pm b(a)\varepsilon^7 + o(\varepsilon^7)$.

It follows from (3.12) that the expected log-likelihood ratio, for a sample
of size $n$, is of size $n\varepsilon^7$. Choosing $\varepsilon$ such that this quantity is bounded away
from zero and infinity, in particular, $\varepsilon = n^{-1/7}$, makes the perturbation only
barely detectable. In that case the likelihood-ratio test for discriminating
between $f$ and $f_n$ does not have asymptotically perfect accuracy.

Our test is able to detect local alternatives such as $F_n$, provided the
function $\pi_{2\alpha}(a)$ for our test satisfies

(3.13) $\lim_{a \to \infty} \pi_{2\alpha}(a) = 1$.

If (3.13) holds, then our test shares the optimal performance of the
likelihood-ratio test.

To establish (3.13), note first that, for $j = 0, \ldots, 4$,

$H_n^{(j)}(x) = H^{(j)}(x) + a\varepsilon^{4-j} \Psi^{(j)}(x/\varepsilon)/\{1 - F(x)\} + o\{\varepsilon^{4-j} I(|x| \le \varepsilon)\}$,

uniformly in $x$. In particular, the second derivative of $H_n - H$ is of size $\varepsilon^2$
and the fourth derivative is asymptotic to $a\Psi^{(4)}(x/\varepsilon)/\{1 - F(x)\}$. Using
these properties, and the arguments leading to Theorem 3.5, the following
result may be proved. It verifies (3.13) in the case where the test in question
is the bootstrap-calibrated one proposed in Section 2.

Theorem 3.6. Assume the hazard rate of the four-times continuously
differentiable distribution $F$ lies in $\mathcal{H}_{02}$, with $m = 1$ and $x_1 = 0$; and that
$F_n$ is given by (3.11), where the four-times continuously differentiable
function $\Psi$ is supported on $[-1, 1]$ and satisfies $\Psi(x) = -x^2$ in a neighborhood of
the origin. Suppose too that $\varepsilon$ in (3.11) is $n^{-1/7}$, that $h = \hat h_{\mathrm{crit}}$, and that the
starting bandwidth $\hat h$ satisfies (3.4). Let $p_\alpha(a, n)$ denote the probability that
the bootstrap-calibrated test of the null hypothesis of monotone hazard rate
rejects the null hypothesis when applied to data from $F_n$. Then (a) $p_\alpha(a, n)$
converges to a limit, $\pi_{2\alpha}(a)$ say, as $n \to \infty$, and (b) $\pi_{2\alpha}(a)$ satisfies (3.13)
as $a \to \infty$.

3.6. Rejection probability under the null hypothesis, and power against 
fixed alternatives. The result below shows that the bootstrap-calibrated 
form of our test is asymptotically consistent in rejecting the null hypothesis 
whenever it is violated by a fixed alternative. 

Theorem 3.7. Assume $F$ has two continuous derivatives on an open
interval $\mathcal{J}'$ which contains the compact interval $\mathcal{J}$, that $f > 0$ on $\mathcal{J}$, but that
the hazard rate for $F$ is strictly decreasing in a subinterval of $\mathcal{J}$. Suppose too
that $K$ satisfies (3.3), that $K''(0) \ne 0$, that $E|X| < \infty$ and that the starting
bandwidth $\hat h$ for the algorithm leading to $\hat h_{\mathrm{crit}}$ defined in Section 2.2
satisfies (3.4). Then $P\{T > c(\alpha)\} \to 1$, as $n \to \infty$, for each $0 < \alpha < 1$, where $c(\alpha)$
is the bootstrap critical point introduced in Section 2.2.

3.7. Calibration against the exponential distribution. Put
$A(x) = B\{F(x)\}/\{1 - F(x)\}$, where $B$ is a standard Brownian bridge, and define

$T_0 = \iint_{x,y\,:\,x+y,\,x-y \in \mathcal{J}} \max\{0,\, A(x + y) + A(x - y) - 2A(x)\}\,dx\,dy$.

In this notation, and using standard Gaussian approximations to the
empirical distribution $\hat F$ (see, e.g., [17]), it can be proved that if $F$ is taken to
be exponential over $\mathcal{J}$, then $n^{1/2} T \to T_0$ in distribution. This result follows
from the fact that, in the exponential case, the cumulative hazard rate is
linear. In particular, in that setting $H$ is in neither $\mathcal{H}_{01}$ nor $\mathcal{H}_{02}$.

Therefore, if we calibrate $T$ by reference to an exponential distribution,
then the critical points for the test will be distant of order $n^{-1/2}$ from the origin.

However, if $H$ is in either $\mathcal H_{01}$ or $\mathcal H_{02}$, this is much further from zero than
the actual critical points of the distribution of $T$. Indeed, we know from
Theorems 3.1 and 3.3 that those points are distant only $O(n^{-1})$ from zero
when $H \in \mathcal H_{01}$, and only distant $O(n^{-6/7})$ when $H \in \mathcal H_{02}$. (The same is true
of the bootstrap critical-point estimator suggested in Section 2.2.) It follows
that, for each value of the nominal rejection probability of an exponentially
calibrated test, the exact rejection probability (for $H$ in either $\mathcal H_{01}$ or $\mathcal H_{02}$)
will converge to 0 as $n \to \infty$.

Put another way, the exponentially calibrated test will become
ultra-conservative as sample size increases. In particular, it will fail,
asymptotically, to detect the perturbation-type null hypothesis discussed in
Section 3.5. In order for detection to be even barely possible in that setting,
the perturbation $\epsilon^4\psi(x/\epsilon)$ (with $\epsilon = n^{-1/7}$) would have to be increased by
the factor $n^{5/14}$.
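
As a rough numerical illustration of the convexity functional on which $T$ is built, the Python sketch below approximates the double integral of $\max\{0, 2H(x) - H(x+y) - H(x-y)\}$ over pairs with $x+y, x-y \in \mathcal J$ by a Riemann sum, with $H$ either a given function or the empirical cumulative hazard $-\log(1 - \hat F)$. The function names, the grid size `m` and the convention that $y$ ranges over both signs are assumptions made for this sketch; they are not taken from the paper.

```python
import bisect
import math


def empirical_cum_hazard(sample):
    """Return x -> -log(1 - F_hat(x)), with F_hat the empirical distribution."""
    xs = sorted(sample)
    n = len(xs)

    def cum_hazard(x):
        f_hat = bisect.bisect_right(xs, x) / n
        if f_hat >= 1.0:
            raise ValueError("empirical distribution equals 1; shrink the interval")
        return -math.log(1.0 - f_hat)

    return cum_hazard


def convexity_statistic(cum_hazard, lo, hi, m=200):
    """Riemann-sum approximation to the integral of
    max{0, 2H(x) - H(x+y) - H(x-y)} over x+y, x-y in [lo, hi]."""
    step = (hi - lo) / m
    grid = [cum_hazard(lo + i * step) for i in range(m + 1)]
    total = 0.0
    for i in range(m + 1):
        # k*step plays the role of |y|; both x+y and x-y must stay on the grid
        for k in range(1, min(i, m - i) + 1):
            total += max(0.0, 2.0 * grid[i] - grid[i + k] - grid[i - k])
    # factor 2: y and -y give the same integrand (an assumed convention)
    return 2.0 * total * step * step
```

For a convex $H$ the sum is exactly zero, while for a concave function such as $-x^2$ on $[0,1]$ it is close to the exact value $1/24$ of the integral; a finer grid reduces the discretisation error.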

4. Simulations. Simulations are carried out for two models. First,
consider a variable $X$ with hazard rate

(4.1)  $H'(x) = a\{(x-b)^3 + b^3\} + c + dx^2,$

where $x, a, b, c > 0$ and $d$ is chosen such that $H'(x) > 0$ for all $x > 0$. The
distribution function corresponding to this hazard function is given by

$F(x) = 1 - \exp[-a\{\tfrac14(x-b)^4 + b^3x\} - cx - \tfrac13 dx^3].$

It is readily verified that $H \in \mathcal H_{01}$ when $d > 0$, $H \in \mathcal H_{02}$ when $d = 0$ and $H$
is in neither $\mathcal H_{01}$ nor $\mathcal H_{02}$ when $d < 0$. Figure 1 shows the graph of $H'(x)$ for
certain values of the parameters.
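
The following Python sketch shows one way to work with model (4.1): it evaluates the hazard rate, finds numerically the smallest $d$ for which the hazard stays nonnegative, and draws observations by inverting the cumulative hazard. All function names are hypothetical. The cumulative hazard is normalised here so that $H(0) = 0$, which is an assumption of the sketch (it differs from the exponent in the displayed $F$ by the constant $ab^4/4$), and bisection is simply a convenient inversion method, not necessarily the one used by the authors.

```python
import math
import random

A, B, C = 2.5, 0.75, 0.5  # parameter values used in the first simulation study


def hazard(x, d, a=A, b=B, c=C):
    """Hazard rate of model (4.1)."""
    return a * ((x - b) ** 3 + b ** 3) + c + d * x * x


def cum_hazard(x, d, a=A, b=B, c=C):
    """Integral of the hazard from 0 to x (normalised so that H(0) = 0)."""
    return a * (((x - b) ** 4 - b ** 4) / 4.0 + b ** 3 * x) + c * x + d * x ** 3 / 3.0


def smallest_d(a=A, b=B, c=C, x_max=10.0, m=100000):
    """Smallest d keeping the hazard >= 0 on a grid of (0, x_max]."""
    grid = (x_max * i / m for i in range(1, m + 1))
    return -min(hazard(x, 0.0, a, b, c) / (x * x) for x in grid)


def draw(d, rng):
    """One observation by inversion: solve H(x) = E with E ~ Exp(1)."""
    target = rng.expovariate(1.0)
    lo, hi = 0.0, 1.0
    while cum_hazard(hi, d) < target:  # bracket the root
        hi *= 2.0
    for _ in range(60):  # bisection; H is nondecreasing when the hazard is >= 0
        mid = 0.5 * (lo + hi)
        if cum_hazard(mid, d) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With $a = 2.5$, $b = 0.75$ and $c = 0.5$, `smallest_d()` returns a value close to $-1.14$, in line with the starting point of the power curve reported for this model.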

The simulations are based on 2000 samples of size n = 50 and, for each 
simulated sample, 2000 resamples are generated. The interval J on which the 
test statistic $T$ is based is given by $[0, F^{-1}(0.95)]$. The starting bandwidth $h$
is determined from the normal reference rule for plug-in estimation, that is,
$h = 1.06\,n^{-1/5}\hat\sigma$, where $\hat\sigma$ is the estimated standard deviation of $X$. The kernel
function used is the normal kernel. The results for $a = 2.5$, $b = 0.75$, $c = 0.50$,
for several values of $d$ and for $\alpha = 0.10$ are presented in Figure 2. The
power curve starts at $-1.14$, which is the smallest possible value of $d$ for
this choice of parameters. The results for other choices of the parameters
and for $\alpha = 0.05$ are similar. For most choices slightly conservative rejection
probabilities are observed. As a comparison we also implemented the global 
sign test of Proschan and Pyke [19] and the local sign test of Gijbels and 
Heckman [13]. From Figure 2 it is clear that the power curves of both tests 
are considerably below the curve of the new test. The power of the global 
test is even identical to zero for all values of d. This confirms what was 
explained in Sections 1 and 3.7 about the lack of power of tests based on 
calibration with respect to the exponential distribution. 
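
The bandwidth step can be sketched in Python as follows. The code computes the normal-reference starting bandwidth and then enlarges it until the kernel-smoothed cumulative hazard is convex on a grid over $\mathcal J$, using the fact that its second derivative $\hat f'/(1 - \hat F) + \hat f^2/(1 - \hat F)^2$ has the sign of $\hat f'(1 - \hat F) + \hat f^2$. The geometric search with ratio `factor`, the grid size and all function names are assumptions of this sketch; the exact search rule of Section 2.2 is not reproduced here, and convexity is only checked at grid points.

```python
import math
import statistics


def phi(u):
    """Standard normal density (the kernel K)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def Phi(u):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def normal_reference_bandwidth(sample):
    """h = 1.06 * n^(-1/5) * (sample standard deviation)."""
    return 1.06 * statistics.stdev(sample) * len(sample) ** (-0.2)


def curvature_sign(x, sample, h):
    """f'(1 - F) + f^2 for the kernel estimates; same sign as H''(x | h)."""
    n = len(sample)
    us = [(x - xi) / h for xi in sample]
    big_f = sum(Phi(u) for u in us) / n
    f = sum(phi(u) for u in us) / (n * h)
    f_prime = sum(-u * phi(u) for u in us) / (n * h * h)  # K'(u) = -u K(u)
    return f_prime * (1.0 - big_f) + f * f


def is_convex_on(sample, h, lo, hi, m=100):
    """Check H''(x | h) >= 0 at m + 1 equally spaced points of [lo, hi]."""
    return all(curvature_sign(lo + (hi - lo) * i / m, sample, h) >= 0.0
               for i in range(m + 1))


def critical_bandwidth(sample, lo, hi, factor=1.05, max_steps=2000):
    """Increase h geometrically from the starting value until convexity holds."""
    h = normal_reference_bandwidth(sample)
    for _ in range(max_steps):
        if is_convex_on(sample, h, lo, hi):
            return h
        h *= factor
    raise RuntimeError("no bandwidth found; increase max_steps")
```

A bootstrap calibration would then resample from the smoothed distribution at the returned bandwidth, recompute the statistic on each resample, and take the appropriate quantile as the critical point. The search stops at the first grid value of $h$ that passes the check, so it only approximates the smallest such bandwidth.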

Next, we consider hazard rates which contain a small "bump" and we
study how well the three tests are able to detect this little perturbation
from $H_0$. The hazard rate considered is

(4.2)  $H'(x) = \exp[\gamma\log x + \beta(2\pi\sigma^2)^{-1/2}\exp\{-(x-\mu)^2/(2\sigma^2)\}],$

where $x, \sigma, \mu > 0$ and $\gamma$ and $\beta$ are real numbers. This model is also considered
in [13]. Graphs of this hazard rate for different values of the parameters are
presented in Figure 3. It is clear that, for $\beta$ sufficiently large, the hazard rate
contains a "bump" at $x = \mu$. The simulation results are obtained from 1000
samples of size $n = 50$ and for the bootstrap procedure 1000 resamples are
used. The results are shown in Table 1. Clearly, the hypothesis $H_0$ is only
satisfied when $\gamma = 0$, 0.50 or 1 and $\beta = 0$. In comparison with the local sign
test of Gijbels and Heckman [13] and the global sign test of Proschan and
Pyke [19], the new testing procedure leads to rejection probabilities
that are higher most of the time, but not always. Also note that the new
test tends to be anticonservative, while the global and local tests are, on
the contrary, quite conservative. This has to be taken into account when
comparing the powers of the three tests.

Fig. 1. Graph of $H'(x)$ for model (4.1) when $a = 2.5$, $b = 0.75$, $c = 0.5$ and
$d = -1, -0.75, -0.50$ and $-0.25$ (dashed curves), $d = 0$ (full curve) and $d = 0.5$, 1 and
1.5 (dotted curves).
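
To make the statement about when $H_0$ holds concrete, the Python sketch below evaluates the hazard rate (4.2) on a grid and checks whether it is nondecreasing there. The function names, the grid range and the numerical tolerance are assumptions of this sketch.

```python
import math


def bump_hazard(x, gamma, beta, sigma=0.1, mu=1.0):
    """Hazard rate of model (4.2): a power of x times the exponential of a
    scaled normal density centred at mu."""
    bump = beta * math.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) \
        / math.sqrt(2.0 * math.pi * sigma ** 2)
    return math.exp(gamma * math.log(x) + bump)


def is_nondecreasing(gamma, beta, sigma=0.1, mu=1.0, lo=0.01, hi=3.0, m=3000):
    """Grid check of monotonicity of the hazard on [lo, hi]."""
    values = [bump_hazard(lo + (hi - lo) * i / m, gamma, beta, sigma, mu)
              for i in range(m + 1)]
    return all(b >= a - 1e-12 for a, b in zip(values, values[1:]))
```

On this grid the check passes for $\beta = 0$ with $\gamma = 0$, 0.50 or 1, and fails for $\gamma < 0$ or for $\beta = 0.3$ with $\sigma = 0.1$ or 0.2, which matches the null and alternative configurations listed for Table 1.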

5. Technical arguments. 

5.1. Proof of Theorem 3.1. Define $\Delta_0F = \hat F - F$, and observe that

(5.1)  $\hat H = H + \dfrac{\Delta_0F}{1 - F} + O_p(n^{-1}), \qquad \Delta_0F = n^{-1/2}B(F) + O_p(n^{-1}\log n),$

where the first result holds uniformly on $\mathcal J$, the second uniformly on the real
line and $B$ denotes a Brownian bridge, the construction of which depends
on the data. The first identity at (5.1) follows by simple Taylor expansion,
while the second uses results of Komlos, Major and Tusnady [17]. Together
the identities imply that

(5.2)  $\hat H = H + n^{-1/2}\dfrac{B(F)}{1 - F} + O_p(n^{-1}\log n),$

uniformly on $\mathcal J$.

Fig. 2. Rejection probability for model (4.1), when $a = 2.5$, $b = 0.75$, $c = 0.5$, and for a
range of values for $d$. The full curve is obtained with the new test, the dotted curve with
the local test of Gijbels and Heckman [13], while the dashed curve represents the nominal
level $\alpha = 0.10$. The global test of Proschan and Pyke [19] has everywhere zero power.

Assume $H \in \mathcal H_{01}$; and, given a function $\psi(x)$ defined for $x \in \mathcal J$, put $\psi(x,y) =
\psi(x+y) + \psi(x-y) - 2\psi(x)$ whenever $x+y, x-y \in \mathcal J$. Now $H(x,y) =
y^2H''(x + \epsilon y)$, where $-1 < \epsilon = \epsilon(x,y) < 1$. Hence, for $H \in \mathcal H_{01}$,

$\inf_{x,y\,:\,x+y,\,x-y\in\mathcal J} y^{-2}H(x,y) > 0.$

Fig. 3. Graph of $H'(x)$ for model (4.2) when $\mu = 1$ and $\beta = 0$ (dashed curve), $\beta = 0.3$
and $\sigma = 0.1$ (full curve) and $\beta = 0.3$ and $\sigma = 0.2$ (dotted curve). For the figure on the left
$\gamma = -0.5$, for the one on the right $\gamma = 0.5$.
Table 1

Rejection probability for model (4.2) and for $\alpha = 0.10$. The numbers in
italic are rejection probabilities under the null hypothesis ($\beta = 0$ and
$\gamma = 0$, 0.50 or 1)

                                                    $\gamma$
Parameter                             Test     -0.50   -0.25   0       0.50    1

$\beta = 0$                           New      0.833   0.643   0.437   0.189   0.121
                                      Global   1.00    0.800   0.100   0.000   0.000
                                      Local    0.983   0.416   0.100   0.034   0.027

$\beta = 0.3$, $\sigma = 0.1$, $\mu = 1$   New      0.675   0.753   0.772   0.656   0.508
                                      Global   0.997   0.458   0.019   0.000   0.000
                                      Local    0.962   0.291   0.178   0.176   0.154

$\beta = 0.3$, $\sigma = 0.2$, $\mu = 1$   New      0.715   0.714   0.663   0.443   0.277
                                      Global   0.999   0.588   0.035   0.000   0.000
                                      Local    0.968   0.301   0.114   0.065   0.054


We may therefore deduce from (5.2) that, for some constant $C_1 > 0$, $-\hat H(x,y) > 0$
only if

(5.3)  $C_1n^{1/2}y^2 \leq \max\{|B\{F(x+y)\} - B\{F(x)\}|,\ |B\{F(x-y)\} - B\{F(x)\}|\} + n^{-1/2}(\log n)A_n,$

where the random variable $A_n$ does not depend on $x$ or $y$ and equals $O_p(1)$
as $n \to \infty$.

For each $x$, let $Y(x)$ denote the supremum of values $y$ such that $x+y, x-y \in \mathcal J$
and (5.3) holds. Then for each $x$, $Y(x) = O_p(n^{-1/3})$. Since

(5.4)  $|B(t+u) - B(t)| = O_p(|u\log u|^{1/2})$

uniformly in $t, u$ such that $0 \leq t, t+u \leq 1$, then

(5.5)  $\sup_{x\in\mathcal J} Y(x) = O_p\{(n^{-1}\log n)^{1/3}\}.$

Defining $\Delta_1F = B(F)$ and $\Delta_2F = B(F)/(1 - F)$, we deduce first by Taylor
expansion and then application of (5.4) that



l-F{x) V 1-F{x), 
+ 1-Fix) V l-Fix)) l-F(x) ^^^^^^"'^ ^ 
1 — F[x) 

uniformly in $x \in \mathcal J$ and $|y| \leq Y(x)$. Therefore,

$T = \int_{\mathcal J}dx\int_{-Y(x)}^{Y(x)} \max\{0, -\hat H(x,y)\}\,dy$

(5.7)  $= -\int_{\mathcal J}dx\int_{-Y(x)}^{Y(x)} \min\{0,\ H(x,y) + n^{-1/2}\Delta_2F(x,y)\}\,dy + O_p\{(n^{-1}\log n)^{4/3}\}$

(5.8)  $= -\int_{\mathcal J}dx\int_{-Y(x)}^{Y(x)} \min\Bigl\{0,\ H(x,y) + n^{-1/2}\dfrac{\Delta_1F(x,y)}{1 - F(x)}\Bigr\}\,dy + O_p\{(n^{-1}\log n)^{4/3}\},$

where the second identity follows from (5.2) and (5.5), and the third comes
from (5.5) and (5.6).

Let $W$ denote the standard Brownian motion through which $B$ may be
expressed as $B(t) = W(t) - tW(1)$ for $0 \leq t \leq 1$. Put $\Delta_3F = W(F)$. Observe
that $\Delta_1F(x,y) - \Delta_3F(x,y) = O_p\{Y(x)^2\}$ uniformly in $x \in \mathcal J$ and $|y| \leq Y(x)$.
Therefore, (5.5) and (5.8) imply that

(5.9)  $T = -\int_{\mathcal J}dx\int_{-Y(x)}^{Y(x)} \min\Bigl\{0,\ H(x,y) + n^{-1/2}\dfrac{\Delta_3F(x,y)}{1 - F(x)}\Bigr\}\,dy + O_p\{(n^{-1}\log n)^{4/3}\}.$

Since $H'''$ is bounded, then $H(x,y) = y^2H''(x) + O(|y|^3)$ as $y \to 0$,
uniformly in $x \in \mathcal J$. From this result, (5.5) and (5.9), we deduce that

(5.10)  $T = T_1 + O_p\{(n^{-1}\log n)^{4/3}\},$

where

(5.11)  $-T_1 = \int_{\mathcal J}dx\int_{-Y(x)}^{Y(x)} \min\Bigl\{0,\ y^2H''(x) + n^{-1/2}\dfrac{\Delta_3F(x,y)}{1 - F(x)}\Bigr\}\,dy$
        $= n^{-1}\int_{\mathcal J}dx\int_{\mathcal J_n(x)} \min\Bigl\{0,\ y^2H''(x) + n^{1/6}\dfrac{\Delta_3F(x, n^{-1/3}y)}{1 - F(x)}\Bigr\}\,dy,$

and $\mathcal J_n(x)$ denotes the set of $y$ such that both $x + n^{-1/3}y$ and $x - n^{-1/3}y$ lie
in $\mathcal J$.
Put

$W_x(y) = n^{1/6}[W\{F(x) + n^{-1/3}yf(x)\} - W\{F(x)\}]/f(x)^{1/2},$

which, like $W$, is a standard Brownian motion. It may be proved from (5.11)
that

$-nE(T_1) = \int_{\mathcal J}dx\int_{-\infty}^{\infty} E\Bigl[\min\Bigl(0,\ z^2H''(x) + \dfrac{f(x)^{1/2}}{1 - F(x)}\bigl[W_x\{z + \tfrac12n^{-1/3}z^2f'(x)f(x)^{-1}\}$
        $\qquad + W_x\{-z + \tfrac12n^{-1/3}z^2f'(x)f(x)^{-1}\}\bigr]\Bigr)\Bigr]\,dz + o(n^{-1/6}).$

From this result and the fact that, for $0 \leq |u| \leq |z|$, $W_x(z+u) + W_x(-z+u)$
has the normal $N(0, 2|z|)$ distribution, we deduce that

(5.12)  $nE(T_1) = \mu + o(n^{-1/6}),$

where $\mu$ is as defined in Section 3.

To derive a central limit theorem for $T_1$, we first approximate $T_1$ by a
sum of 3-dependent random variables, as follows. Define $\lambda_n = \log n$ and $\delta =
\delta(n) = \lambda_n(n^{-1}\log n)^{1/3}$. Put



1/2 ^3F{x,y) 



dy, 



-T2= [ dx [ min(o,y2/7"(x) + n-i/2^ ^, 
-Ui) =1 dxf min(o,/g--(x) + n-V2 ^3F(^;^n 



compare these definitions with the first identity at (5.11). Then $T_2 = \sum_i T_2(i)$.
Note that, since Brownian motion has independent increments, $T_2(i)$ is
stochastically independent of $T_2(j)$ for $|i - j| > 3$.

In view of (5.5), the probability that $\max_x|Y(x)| \leq \delta$ converges to 1 as
$n \to \infty$. Note too that $\max_x|Y(x)| \leq \delta$ implies $T_1 = T_2$. Hence, if we
prove that the following three results are true: (a) $\mathrm{var}\,T_2 \sim \mathrm{var}\,T_1 \sim \sigma^2n^{-7/3}$,
(b) $(T_2 - ET_2)/(\mathrm{var}\,T_2)^{1/2}$ has an asymptotic standard normal distribution,
and (c) $n^{7/6}(ET_1 - ET_2) \to 0$; then it will follow that $n^{7/6}(T_1 - ET_1)/\sigma$ has
an asymptotic standard normal distribution. Theorem 3.1 is a consequence
of this property and (5.12).

Result (c) may be proved using the argument leading to (5.12), and the
first asymptotic relation in (a) may be derived using the method giving the
second. Therefore, it suffices to show that (b) holds and that (d) $\mathrm{var}\,T_1 \sim
\sigma^2n^{-7/3}$.

To prove (b), let $C > 0$ and define $T_3(i) = n^{7/6}\delta^{-1/2}T_2(i)$, $T_4(i) = T_3(i) \times
I\{|T_3(i)| \leq C\}$, $T_5(i) = T_3(i) - T_4(i)$ and $T_j = \sum_i T_j(i)$ for $j = 4, 5$. For all
sufficiently large $C$, the variance of $T_4$, and the number of nondegenerate
summands $T_4(i)$, are both asymptotic to constant multiples of $\delta^{-1}$; and the
summands are uniformly bounded. Therefore, using a central limit theorem
for uniformly bounded m-dependent random variables (see, e.g.,
Theorem 7.3.1, page 214 of [9]), we may prove that $(T_4 - ET_4)/(\mathrm{var}\,T_4)^{1/2}$ has an
asymptotic standard normal distribution; call this result (e). The argument
that we shall use to prove (d) may be employed to show that as $C \to \infty$,
(f) $\lim_{n\to\infty} \delta \times \mathrm{var}\,T_4 \to \sigma^2$ and (g) $\lim_{n\to\infty} \delta\,\mathrm{var}\,T_5 \to 0$. Combining (e)-(g),
we deduce that $(T_3 - ET_3)/(\mathrm{var}\,T_3)^{1/2}$ has an asymptotic normal distribution.
This is equivalent to (b).

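It remains to derive (d). Recall that $g = f^{1/2}/(1 - F)$, and define

$V_x(y) = n^{1/6}[W\{F(x + n^{-1/3}y)\} - W\{F(x)\}]/f(x)^{1/2},$
$W_1(x_1, y_1) = \min\{0,\ y_1^2H''(x_1) + g(x_1)V_{x_1}(y_1)\},$
$W_2(x_1, x, y_2) = \min\{0,\ y_2^2H''(x_1 + n^{-1/3}x) + g(x_1 + n^{-1/3}x)V_{x_1+n^{-1/3}x}(y_2)\}$

and $\mathcal J_n'(x_1) = \{x : x_1 + n^{-1/3}x \in \mathcal J\}$. In this notation,

$n^2\,\mathrm{var}\,T_1 = \int_{\mathcal J}dx_1\int_{\mathcal J}dx_2\int_{\mathcal J_n(x_1)}\int_{\mathcal J_n(x_2)} \mathrm{cov}[\min\{0,\ y_1^2H''(x_1) + g(x_1)V_{x_1}(y_1)\},$
        $\qquad \min\{0,\ y_2^2H''(x_2) + g(x_2)V_{x_2}(y_2)\}]\,dy_1\,dy_2$
        $= n^{-1/3}\int_{\mathcal J}dx_1\int_{\mathcal J_n'(x_1)}dx\int_{\mathcal J_n(x_1)}dy_1\int_{\mathcal J_n(x_1+n^{-1/3}x)} \mathrm{cov}\{W_1(x_1,y_1), W_2(x_1,x,y_2)\}\,dy_2,$

where $\mathcal J_n(x)$ is as defined below (5.11). In view of the independent increments
of Brownian motion, the random variables $V_{x_1}(y_1)$ and $V_{x_1+n^{-1/3}x}(y_2)$ are
independent if $|y_1| + |y_2| \leq |x|$. In this case, the covariance in the second
identity above vanishes. Therefore,

(5.13)  $n^{7/3}\,\mathrm{var}\,T_1 = \int_{\mathcal J}dx_1\int_{\mathcal J_n'(x_1)}dx \iint_{y_1,y_2\,:\,|y_1|+|y_2|>|x|;\ \mathcal C(x_1,x)} \mathrm{cov}\{W_1(x_1,y_1), W_2(x_1,x,y_2)\}\,dy_1\,dy_2,$

where $\mathcal C(x_1,x)$ denotes the constraint that $y_1 \in \mathcal J_n(x_1)$ and $y_2 \in \mathcal J_n(x_1 +
n^{-1/3}x)$.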

The random variables |Vri(xi,yi)| and \W2{xi,x,y2)\ are respectively 
bounded by Ci|iVi|J(|7Vi| > Cay?) and C7i|iV2|/(|iV2| > C22/I), where Ni and 
N2 are standard normal random variables, and Ci and C2 are positive con- 
stants not depending on xi, x, yi or 7/2, although the correlation between 
$N_1$ and $N_2$ does depend on these quantities. We may therefore deduce that,
for constants $C_3, C_4 > 0$,

I cov{Wi (xi , yi) , W2 (xi , 2/2) } I 

(5.14) < CiE{\Ni\^I{\Ni\ > C2yl)Y'^E{\N2\^I{\N2\ > C2yl)Y'^ 
<C^eM-C^{y\ + yt)}. 

Therefore, $|\mathrm{cov}\{W_1(x_1,y_1), W_2(x_1,x,y_2)\}|$ is bounded above by a function
which does not depend on $n$ and whose integral over $-\infty < x < \infty$ and over
all real $y_1, y_2$ that satisfy $|y_1| + |y_2| > |x|$ is bounded uniformly in $x_1 \in \mathcal J$.
Furthermore, if $V$ is a standard Brownian motion, then

$\mathrm{cov}\{W_1(x_1,y_1), W_2(x_1,x,y_2)\}$
$\quad \to \mathrm{cov}(\min\{0,\ y_1^2H''(x_1) + g(x_1)V(y_1)\},\ \min[0,\ y_2^2H''(x_1) + g(x_1)\{V(x + y_2) - V(x)\}]),$

uniformly in $x_1 \in \mathcal J$ and $x$, $y_1$ and $y_2$ in any compact set. We may therefore
deduce from (5.13) and the dominated convergence theorem that $\mathrm{var}\,T_1 \sim
\sigma^2n^{-7/3}$, which is the desired result (d). Note that (5.14) also implies the
finiteness of $\sigma^2$.

5.2. Proof of Theorem 3.2. Put $\omega_0 = 0$ and $\omega_j = 2j - 1$ for $j \geq 1$. Observe
that, for $j = 0, 1, 2$ and each $\eta > 0$,

(5.15)  $\tilde F^{(j)}(x) - F^{(j)}(x) = O_p\{(nh^{\omega_j})^{\eta-(1/2)} + h^2\},$

uniformly in $h \in \mathcal H(\delta_1, \delta_2)$ and $x \in \mathcal J'$. (The assumption that $F$ has four
bounded derivatives is needed to derive the $O_p(h^2)$ remainder term in (5.15)
when $j = 2$. The other part of the remainder at (5.15), which applies to the
error of the left-hand side about its mean, may be obtained by applying the
stochastic approximation of Komlos, Major and Tusnady [17].) It follows
from this property and (2.3) that, with probability 1, $\tilde H''$ converges to $H''$
uniformly in $h \in \mathcal H(\delta_1, \delta_2)$ and $x \in \mathcal J'$. We may choose $\mathcal J'$ and $\epsilon > 0$ such that
$H'' > \epsilon$ on $\mathcal J'$. In this case, and with probability 1, $\tilde H'' > 0$ on $\mathcal J'$ for all
sufficiently large $n$. In particular, for all sufficiently large $n$, the hazard rate
corresponding to $\tilde F$ lies in $\mathcal H_{01}$.

The argument leading to Theorem 3.1 may now be used to prove that
(3.5) holds when $\tilde F$, rather than $F$, is the sampled distribution, provided $\mu$
and $\sigma$ at (3.5) are replaced by the analogous functionals of $\tilde F$. Let these be
$\tilde\mu$ and $\tilde\sigma$, respectively, and denote by (R) the corresponding version of (3.5).
By (5.15),

(5.16)  $|\tilde\mu - \mu| + |\tilde\sigma - \sigma| = O_p\{(nh^3)^{\eta-(1/2)} + h^2\} = o_p(n^{-1/6}),$

the second identity holding uniformly in $h \in \mathcal H(\delta_1, \delta_2)$ and following from (3.2).
Property (3.5) follows from (5.16) and (R).
We should mention that the assumption in Theorem 3.1 that $F$ have
three derivatives is imposed for simplicity, and is a little more stringent
than necessary. At (5.10) we need only two derivatives and a Hölder
condition of order $\tfrac12 + \epsilon$ on $F''$, in which case the $O_p$ term at (5.10) becomes
$O_p\{(n^{-1}\log n)^{\{(7/2)+\epsilon\}/3}\} = o_p(n^{-7/6})$ (as required), the identity holding
provided $\epsilon > 0$. An empirical version of this argument can be developed
provided $h \in \mathcal H(\delta_1, \delta_2)$ and $\delta_1, \delta_2$ satisfy (3.2).

5.3. Proof of Theorem 3.3. The assumption that the hazard rate is
nondecreasing and that $H''(x_i) = 0$ implies that $H'''(x_i) = 0$ for $1 \leq i \leq m$. To
appreciate why, observe that

$H(x,y) = y^2H''(x) + \tfrac1{12}y^4H^{(4)}(x + \theta y),$

where $-1 < \theta = \theta(x,y) < 1$. Taking $x = x_i + u$, where $|u|$ is small, and
Taylor-expanding, we deduce that

$H(x_i + u, y) = y^2uH'''(x_i) + (\tfrac12u^2y^2 + \tfrac1{12}y^4)H^{(4)}\{x_i + \theta'(|u| + |y|)\},$

where —1<6'< 1. If H"'{xi) / 0, then, taking \u\ = and choosing 

the sign of u such that uH"'{xi) < 0, we find that as y — > 0, H{xi + u,y) = 
— \y\'^^'^\H"' {xi)\ -|- o(|y|^/^). This implies that H is nonconvex near Xi, and 
so contradicts the assumption that the hazard rate is nondecreasing. 
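
The two expansions used in this argument can be checked on simple polynomials. In the Python sketch below (names are hypothetical and the test functions are chosen only for illustration), the second difference $\psi(x+y) + \psi(x-y) - 2\psi(x)$ of $\psi(x) = x^4$ equals $12x^2y^2 + 2y^4$ exactly, in agreement with $y^2\psi''(x) + \tfrac1{12}y^4\psi^{(4)}$; and adding a cubic term, so that the third derivative is nonzero at a point where the second derivative vanishes, makes the second difference negative on one side of that point.

```python
def second_difference(psi, x, y):
    """psi(x + y) + psi(x - y) - 2 psi(x), the quantity written psi(x, y)."""
    return psi(x + y) + psi(x - y) - 2.0 * psi(x)


def quartic(x):
    # psi'' and psi''' both vanish at 0, and psi'''' = 24 > 0
    return x ** 4


def cubic_plus_quartic(x):
    # psi''(0) = 0 but psi'''(0) = 6 != 0, so convexity fails just left of 0
    return x ** 3 + x ** 4


# exact fourth-order expansion for the quartic
x, y = 0.3, 0.1
expansion = 12.0 * x ** 2 * y ** 2 + 2.0 * y ** 4
gap = abs(second_difference(quartic, x, y) - expansion)

# a negative second difference near 0 when the third derivative is nonzero
negative_value = second_difference(cubic_plus_quartic, -0.01, 0.001)
```

Here `gap` is zero up to rounding error and `negative_value` is negative, which is the non-convexity that the proof rules out under the null hypothesis.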

Result (5.2) continues to hold in the setting of Theorem 3.3, and so
by (5.7),

(5.17)  $T = T_2 + O_p\{(n^{-1}\log n)^{8/7}\},$

where

$T_2 = -\int_{\mathcal J}dx\int_{-Y(x)}^{Y(x)} \min\{0,\ y^2H''(x) + \tfrac1{12}y^4H^{(4)}(x + \theta y) + n^{-1/2}\Delta_2F(x,y)\}\,dy$

and we redefine $Y(x)$ to equal the supremum of values $y$ such that $x+y, x-y \in \mathcal J$
and

$y^2H''(x) + \tfrac1{12}y^4H^{(4)}(x + \theta y) + n^{-1/2}\Delta_2F(x,y) + n^{-1}(\log n)A_n < 0,$

where the random variable $A_n = O_p(1)$ does not depend on $x$ or $y$. In
deriving (5.17), we have used the fact that, by employing arguments leading
to (5.5), it may be proved that

$\sup_{x\in\mathcal J} Y(x) = O_p\{(n^{-1}\log n)^{1/7}\}.$

More analogously to (5.5), it may be shown that if $\eta > 0$ and $\mathcal J' = \mathcal J(\eta)$ is
the subset of $\mathcal J$ all of whose points are distant at least $\eta$ from each $x_i$, then,
using the new definition of $Y(x)$,

$\sup_{x\in\mathcal J'} Y(x) = O_p\{(n^{-1}\log n)^{1/3}\}.$
Using this result and the arguments leading to (5.9) and (5.10), we may show
that, if $T_2(\mathcal J')$ denotes the contribution to $T_2$ from the integral over $x \in \mathcal J'$,
rather than $x \in \mathcal J$, then $T_2(\mathcal J') = T_3(\mathcal J') + O_p(n^{-1})$, where

$T_3(\mathcal J') = -\int_{\mathcal J'}dx\int_{-Y(x)}^{Y(x)} \min\Bigl\{0,\ y^2H''(x) + n^{-1/2}\dfrac{\Delta_3F(x,y)}{1 - F(x)}\Bigr\}\,dy.$

The methods leading to (5.12) give that $E\{T_3(\mathcal J')\} = O(n^{-1})$. Therefore,

(5.18)  $T_2(\mathcal J') = O_p(n^{-1}).$

Let $\eta > 0$ be less than half the minimum of $x_{i+1} - x_i$ over $0 \leq i \leq m$, where
$x_0$ denotes the lower limit of $\mathcal J$ and $x_{m+1}$ is the upper limit. Write $T_2(x_i, \eta)$
for the contribution to $T_2$ from the integral over $x_i - \eta < x < x_i + \eta$. Then

$T_2(x_i, \eta) = -\int_{-\eta}^{\eta}du\int_{-Y(x_i+u)}^{Y(x_i+u)} \min[0,\ (\tfrac12u^2y^2 + \tfrac1{12}y^4)H^{(4)}\{x_i + \theta_i(|u| + |y|)\} + n^{-1/2}\Delta_2F(x_i + u, y)]\,dy,$

where $-1 \leq \theta_i \leq 1$. Changing variables from $(u, y)$ to $(v, z)$, where $u = n^{-1/7}v$
and $y = n^{-1/7}z$, we deduce that

(5.19)  $T_2(x_i, \eta) = -n^{-6/7}\int_{-\infty}^{\infty}dv\int_{-\infty}^{\infty} \min\{0,\ (\tfrac12v^2z^2 + \tfrac1{12}z^4)H^{(4)}(x_i) + g(x_i)W_i(v + z)\}\,dz + o_p(n^{-6/7}),$

where $W_i$ is a standard Brownian motion. The processes $W_i$, $1 \leq i \leq m$,
may be taken to be independent without violating (5.19). Theorem 3.3 now
follows on combining (5.18) and (5.19).

5.4. Reasons for failure of bootstrap version of Theorem 3.3. In order
for $\tilde H^{(4)}$ to consistently estimate $H^{(4)}$, it is necessary that the bandwidth $h$
used to construct $\tilde F$ be of larger order than $n^{-1/7}$. For simplicity, we shall
assume below that $h$ is at least of size $n^{\epsilon-(1/7)}$ for some $\epsilon > 0$, although our
argument may be pursued to an unaltered conclusion when the increase of
$h$ over $n^{-1/7}$ is by only a logarithmic factor.

Put $c = \tfrac12\int u^2K(u)\,du$. Observe that, for each $\eta > 0$, $\tilde F'' = F'' + ch^2F^{(4)} +
O_p\{(nh^3)^{\eta-(1/2)}\} + o(h^2)$, uniformly in $x \in \mathcal J'$. [Here we have used the fact
that $h \geq n^{\epsilon-(1/7)}$.] It follows that $\tilde H'' = D^2A(F + ch^2F'') + O_p\{(nh^3)^{\eta-(1/2)}\} +
o(h^2)$, uniformly in $x \in \mathcal J'$, where $A(u) = -\log(1 - u)$ and $D$ is the differential
operator. Now, $D^2A(F + ch^2F'') = D^2A(F) + ch^2D^2\{F''A'(F)\} + o(h^2)$, and
$D^2\{F''A'(F)\} = D^2\{D^2A(F) - (F')^2A''(F)\} = D^2\{H'' - (H')^2\}$. Therefore,
$D^2\{F''A'(F)\} = H^{(4)} - 2\{H'H''' + (H'')^2\}$. Hence,

(5.20)  $\tilde H'' = H'' + ch^2[H^{(4)} - 2\{H'H''' + (H'')^2\}] + O_p\{(nh^3)^{\eta-(1/2)}\} + o(h^2),$

uniformly in $x \in \mathcal J'$.

The term of order $(nh^3)^{\eta-(1/2)}$ on the right-hand side of (5.20) is, of
course, the result of stochastic error, and performance would only improve
if it could be dropped. Let us assume this can be done. Then we can estimate
$H''(x)$ with error equal to

(5.21)  $ch^2[H^{(4)}(x) - 2\{H'(x)H'''(x) + H''(x)^2\}] + o(h^2).$

Now, the limiting distribution of $T$, when $H \in \mathcal H_{02}$, is determined by
properties of $H$ in arbitrarily small neighborhoods of the points $x_i$, and so it is
there that we are most interested in properties of $\tilde H''$. If $x$ is in a decreasingly
small neighborhood of $x_i$, the expansion at (5.21) equals

$ch^2[H^{(4)}(x_i) - 2\{H'(x_i)H'''(x_i) + H''(x_i)^2\}] + o(h^2) = ch^2H^{(4)}(x_i) + o(h^2),$

the second identity holding since $H''(x_i) = H'''(x_i) = 0$. Therefore, if we
ignore stochastic fluctuations (which are asymptotically equally likely to
increase or decrease the value of $\tilde H''$), $\tilde H''(x)$ is distant at least order $h^2$
strictly above zero when $x$ is in the neighborhood of $x_i$. Since $h$ is at least of
order $n^{-1/7}$, then the distance of $\tilde H''$ above zero, in the neighborhood of $x_i$,
is [with probability at least $\tfrac12 + o(1)$] no less than a certain fixed constant
multiple of $n^{-2/7}$; call this result (R).

Let $H^*$ denote the bootstrap version of $\hat H$, and recall from the proof
of Theorem 3.2 that that limit result derives entirely from fluctuations of
$\hat H(x,y)$ below zero when $x$ is close to $x_i$ and $y$ is near zero. If $H \in \mathcal H_{02}$,
these fluctuations occur with a probability that is bounded away from zero
as $n$ increases. The perturbations of $H^* - \tilde H$ are of order only $n^{-1/2}$, and,
in particular, are of strictly smaller order than $n^{-2/7}$. This property and
result (R) imply that the probability that the empirical fluctuations of $H^*$
near $x_1, \ldots, x_m$ ever protrude below zero converges to zero as $n \to \infty$. In
consequence, the limit results described by Theorem 3.2 do not apply in the
bootstrap setting.

5.5. Proof of Theorem 3.4. Observe that

(5.22)  $\tilde H(x+y) + \tilde H(x-y) - 2\tilde H(x) = y^2\int_0^1\{\tilde H''(x + ty) + \tilde H''(x - ty)\}(1 - t)\,dt.$

Let $H_G$ denote the version of $H$ that arises if $F$ is replaced by a
distribution $G$, and note that, by (2.3) and approximations based, for example, on
the Komlos, Major and Tusnady [17] embedding,

(5.23)  $\tilde H'' = H''_{E\tilde F} + (1 - F)^{-1}(\tilde f' - E\tilde f') + O_p\{(nh)^{\eta-(1/2)}\},$

uniformly in $x \in \mathcal J'$ and in $h \in \mathcal H$ for each $\eta > 0$. The argument in Section 5.4
[see particularly (5.20)] shows that

$H''_{E\tilde F} = H'' + ch^2[H^{(4)} - 2\{H'H''' + (H'')^2\}] + o(h^2)$

uniformly on $\mathcal J'$ and in $h \in \mathcal H$. Therefore,

(5.24)  $\tfrac12\{H''_{E\tilde F}(x + y) + H''_{E\tilde F}(x - y)\} = H''(x) + (\tfrac12y^2 + ch^2)H^{(4)}(x_1) + o(h^2 + y^2)$

uniformly in $h \in \mathcal H$, $|x - x_1| \leq \delta(n)$ and $|y| \leq \delta(n)$ for any sequence $\delta(n) \downarrow 0$.
Furthermore, 



$\tilde f'(x) - E\tilde f'(x) = h^{-2}\int K''(u)\{\hat F(x - hu) - F(x - hu)\}\,du$
        $= h^{-2}n^{-1/2}\int K''(u)[W\{F(x - hu)\} - W\{F(x)\}]\,du + O_p\{h^{-1}n^{-1/2}(\log n)^{1/2}\},$

uniformly in $h \in \mathcal H$ and $x \in \mathcal J'$, where $W$ is a standard Brownian motion.
Put $h = n^{-1/7}q$, $x = x_1 + n^{-1/7}s + ty$ and $y = n^{-1/7}z$, and recall that $g =
f^{1/2}/(1 - F)$. Then there exists a standard Brownian motion $V$ such that

$h^{-2}n^{-1/2}\{1 - F(x)\}^{-1}[W\{F(x - hu)\} - W\{F(x)\}]$
        $= n^{-2/7}q^{-2}g(x_1)\{V(s + tz - qu) - V(s + tz)\} + O_p\{n^{-3/7}(\log n)^{1/2}(|q| + |s| + |z|)\},$

uniformly in $0 \leq t \leq 1$, $|u| \leq C$ for any $C > 0$ and $q, s, z$ such that $n^{-1/7}q \in \mathcal H$,
$|s| \leq n^{1/7}\delta(n)$ and $|z| \leq n^{1/7}\delta(n)$. Therefore, defining $M = (1 - F)^{-1}(\tilde f' -
E\tilde f')$, we have

(5.25)  $\int_0^1\{M(x + ty) + M(x - ty)\}(1 - t)\,dt$
        $= n^{-2/7}q^{-2}g(x_1)\int K''(u)\,du\int_0^1\{V(s + tz - qu) + V(s - tz - qu)\}(1 - t)\,dt$
        $\quad + o_p\{n^{-3/7}(\log n)^{1/2}(|q| + |q|^{-1} + |s| + |z|)\},$

uniformly in $n^{-1/7}q \in \mathcal H$, $|s| \leq n^{1/7}\delta(n)$ and $|z| \leq n^{1/7}\delta(n)$.
Combining (5.22)-(5.25), and taking $x = x_1 + n^{-1/7}s$ and $y = n^{-1/7}z$, we
deduce that

(5.26)  $n^{4/7}\{\tilde H(x + y) + \tilde H(x - y) - 2\tilde H(x)\}$
        $= z^2(\tfrac12s^2 + cq^2 + \tfrac1{12}z^2)H^{(4)}(x_1)$
        $\quad + z^2q^{-2}g(x_1)\int K''(u)\,du\int_0^1\{V(s + tz - qu) + V(s - tz - qu)\}(1 - t)\,dt$
        $\quad + o_p\{n^{-1/7}(\log n)^{1/2}z^2(|q| + |q|^{-1} + |s| + |z|)\} + o_p\{z^2(q^2 + s^2 + z^2)\},$

uniformly in $n^{-1/7}q \in \mathcal H$, $|s| \leq n^{1/7}\delta(n)$ and $|z| \leq n^{1/7}\delta(n)$. The theorem
follows from (5.26).

5.6. Proof of Theorem 3.5. Dividing both sides of (5.26) by $z^2$ and letting
$z \to 0$, we deduce that, when $h = \hat h_{\mathrm{crit}}$, $n^{2/7}\tilde H''(x_1 + n^{-1/7}s) = S(s) + o_p(1)$,
uniformly in $|s| \leq n^{1/7}\delta(n)$, where $S(s)$ is defined as at (3.10). Thus, the
bootstrap calibration step involves sampling from a distribution whose
cumulative hazard rate $\tilde H$ is convex on $\mathcal J$ and satisfies $\tilde H''(x) > 0$ for all $x \in \mathcal J$,
excepting a single point $\tilde x$ which may be expressed as $\tilde x = x_1 + n^{-1/7}A +
o_p(n^{-1/7})$, where $A$ is uniquely defined by $S(A) = 0$. At this point $\tilde H''$
vanishes. Reworking the proof of Theorem 3.3, we deduce that the critical point
$\hat c(\alpha)$ of $T^*$, defined conditional on the data $\mathcal X$, equals $n^{-6/7}\tau_\alpha\{S''(A), g(x_1)\} +
o_p(n^{-6/7})$. [Here, $T^*$ denotes the value of $T$ computed from an n-sample
drawn from the distribution $\hat F(\cdot|\hat h_{\mathrm{crit}})$.]

The distribution of $T$ may be represented, in asymptotic form, as before,
and in terms of the same Brownian motion $W$ that was used to construct
the representation for $\tilde H$ at (5.26). In particular, the Brownian motion $W_1$
appearing at (5.19) (when $i = 1$) may be taken identical to the process $V$
at (5.26). Letting $W$ denote the common process, we see that the inequality
$T \leq \hat c(\alpha)$ may equivalently be written as

(5.27)  $n^{-6/7}Z_1 + o_p(n^{-6/7}) \leq n^{-6/7}\tau_\alpha\{S''(A), g(x_1)\} + o_p(n^{-6/7}),$

where $Z_1$ is defined by (3.7) with $i = 1$. Theorem 3.5 follows from (5.27).

5.7. Proof of Theorem 3.7. If the hazard rate $H'$ is not increasing on $\mathcal J$,
then, for some $\epsilon > 0$, there exists a nondegenerate rectangle $\mathcal R$ such that for
all $(x,y) \in \mathcal R$, both $x + y$ and $x - y$ lie in $\mathcal J$ and $H(x,y) \leq -\epsilon$. Under the
hypotheses of the theorem, $\hat H(x,y) = H(x,y) + o_p(1)$ uniformly in $(x,y)$
and so $T \geq \epsilon|\mathcal R| + o_p(1)$, where $|\mathcal R|$ denotes the area of $\mathcal R$. Therefore, the
theorem will follow if we prove that, for each $\alpha \in (0,1)$, the point $\hat c(\alpha)$ derived
using the bootstrap argument in Section 2.2 satisfies

(5.28)  $P\{\hat c(\alpha) > \eta\} \to 0$ for each $\eta > 0$.
As $h \to \infty$, $E\{\hat f(x|h)\} = h^{-1}K(0) + o(h^{-1})$ and $E|\hat f'(x|h)| = h^{-3}K''(0) \times
E|x - X| + o(h^{-3})$, uniformly in $x \in \mathcal J$. Hence, there exists $h_0 > 0$ such that
$\{E\hat f(x|h_0)\}^2 > 2E|\hat f'(x|h_0)|$ for all $x \in \mathcal J$. It may be proved from this
property that, with probability converging to 1, $\hat f(x|h_0)^2 > |\hat f'(x|h_0)|$ for all $x \in \mathcal J$.
Therefore, by (2.3), the probability that $\hat H''(x|h_0) > 0$ for all $x \in \mathcal J$ converges
to 1 as $n \to \infty$, and so

(5.29)  $P(\hat h_{\mathrm{crit}} \leq h_0) \to 1.$

Standard calculations of the expected value of a kernel distribution
estimator show that, under the conditions of the theorem, for each $h_1 > 0$, there
exists $\epsilon(h_1) > 0$ such that, for all sufficiently large $n$,

$1 - E\{\hat F(x|h)\} > \epsilon(h_1)$ for all $x \in \mathcal J$ and all $h \in (0, h_1]$.

By employing a stochastic approximation based on the results of Komlos,
Major and Tusnady [17], it may be proved that, for each $h_1 > 0$,

$|\hat F(x|h) - E\{\hat F(x|h)\}| = o_p(1)$ uniformly in $x \in \mathcal J$ and $h \in (0, h_1]$.

Therefore,

$P\{1 - \hat F(x|h) > \tfrac12\epsilon(h_1)$ for all $x \in \mathcal J$ and all $h \in (0, h_1]\} \to 1.$

This result and (5.29) imply that

(5.30)  $P\{1 - \hat F(x|\hat h_{\mathrm{crit}}) > \tfrac12\epsilon(h_0)$ for all $x \in \mathcal J\} \to 1.$

If $\hat F_h^*$ denotes the bootstrap version of $\hat F$, computed from an n-sample
drawn from the distribution $\hat F(\cdot|h)$ rather than from $F$, then

$\sup_{x\in\mathcal J}\ \sup_{h\in(0,h_1]} E\{|\hat F_h^*(x) - \hat F(x|h)|^\lambda\} = O(n^{-\lambda/2})$

for all $\lambda > 0$. (The method of proof involves only direct calculation of
moments, first conditional on the data and then unconditionally.) Therefore,
if ^1 and A2 are subsets of J and [Cn~^/^, /ii], respectively, each of which 
contains no more than $O(n^D)$ elements, then for each $\lambda > 0$ and by Hölder's
inequality,

$E\Bigl\{\sup_{x\in A_1}\sup_{h\in A_2}|\hat F_h^*(x) - \hat F(x|h)|\Bigr\} \leq \Bigl[\sum_{x\in A_1}\sum_{h\in A_2} E\{|\hat F_h^*(x) - \hat F(x|h)|^\lambda\}\Bigr]^{1/\lambda}$
        $= \{O(n^{2D-(\lambda/2)})\}^{1/\lambda} = O(n^{(2D/\lambda)-(1/2)}).$

Since $D/\lambda$ can be made arbitrarily small by choosing $\lambda$ sufficiently large,
then we have proved that, for each $\eta > 0$, and each choice of $A_1$ and $A_2$ with
only polynomially many elements,

$E\Bigl\{\sup_{x\in A_1}\sup_{h\in A_2}|\hat F_h^*(x) - \hat F(x|h)|\Bigr\} = O(n^{\eta-(1/2)}).$

Using this property, and the fact that $K$ is Hölder continuous, it may be
shown by a "continuity argument" (see, e.g., [8]) that



^<^sup sup \F^{x)-F{x\h)\\ = o{l) 

for any C > 0. It follows that if /i is a random element of the interval [Cn~^/^, hi] 

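(5.31)  $E\Bigl\{\sup_{x\in\mathcal J}|\hat F_h^*(x) - \hat F(x|h)|\Bigr\} = o(1).$

Write simply $F^*$ for $\hat F^*_{\hat h_{\mathrm{crit}}}$, and put $H^* = -\log(1 - F^*)$ and
$\hat H(\cdot|h) = -\log\{1 - \hat F(\cdot|h)\}$. Taking $h_1 = h_0$ and $h = \hat h_{\mathrm{crit}}$, which in view of (5.29) and the assumptions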
in the theorem satisfies P(Cn~^/^ < hcnt ^ ^0) ~^ 1 for some C > 0, we 
deduce from (5.31) that $|F^* - \hat F(\cdot|\hat h_{\mathrm{crit}})| = o_p(1)$ uniformly on $\mathcal J$. From this
result and (5.30), we see that

(5.32)  $\sup_{x\in\mathcal J}|H^*(x) - \hat H(x|\hat h_{\mathrm{crit}})| = R_1^*,$

where, here and below, $R_j^*$ denotes a random variable that is defined through
Monte Carlo simulation conditional on $\mathcal X$ and satisfies $P(|R_j^*| > \eta) \to 0$ for
each $\eta > 0$, where the probability is defined in the unconditional sense.
If $T^*$ denotes the bootstrap version of $T$, then

$T^* = \iint_{x,y\,:\,x+y,\,x-y\in\mathcal J} \max\{0,\ 2H^*(x) - H^*(x + y) - H^*(x - y)\}\,dx\,dy$
        $= \iint_{x,y\,:\,x+y,\,x-y\in\mathcal J} \max\{0,\ 2\hat H(x|\hat h_{\mathrm{crit}}) - \hat H(x + y|\hat h_{\mathrm{crit}}) - \hat H(x - y|\hat h_{\mathrm{crit}})\}\,dx\,dy + R_2^*$
        $= R_2^*,$

where the second identity follows from (5.32) and the third from the fact
that, by the definition of $\hat h_{\mathrm{crit}}$, $\hat H(\cdot|\hat h_{\mathrm{crit}})$ is convex on $\mathcal J$. Therefore, $P(T^* >
\eta) \to 0$ for each $\eta > 0$. Hence, since $\hat c(\alpha)$ is defined by $P\{T^* > \hat c(\alpha)|\mathcal X\} = \alpha$,
then (5.28) must hold.